[ENH] multipart S3 file uploads #2590
Reviewer Checklist
Please leverage this checklist to ensure your code review is thorough before approving:
Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
@@ -66,6 +66,7 @@ rayon = "1.8.0"
criterion = "0.3"
random-port = "0.1.1"
serial_test = "3.1.1"
rand_xorshift = "0.3.0"
insecure but much faster randomness for tests
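To illustrate the trade-off mentioned above, here is a minimal std-only sketch of the xorshift idea: a tiny, deterministic, non-cryptographic generator that can fill large test buffers quickly. It is not the `rand_xorshift` crate's API; `XorShift64` and `test_payload` are hypothetical names used only for this example.

```rust
/// Hypothetical stand-in for a fast, insecure test RNG (xorshift64).
/// Not suitable for anything security-sensitive.
struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift state must never be zero, so substitute a fixed constant.
        let state = if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed };
        Self { state }
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }

    /// Fill `buf` with pseudo-random bytes, 8 bytes per generator step.
    fn fill_bytes(&mut self, buf: &mut [u8]) {
        for chunk in buf.chunks_mut(8) {
            let bytes = self.next_u64().to_le_bytes();
            chunk.copy_from_slice(&bytes[..chunk.len()]);
        }
    }
}

/// Build a reproducible test payload of `len` bytes from `seed`.
fn test_payload(seed: u64, len: usize) -> Vec<u8> {
    let mut buf = vec![0u8; len];
    XorShift64::new(seed).fill_bytes(&mut buf);
    buf
}
```

Because the output depends only on the seed, a failing upload test can be reproduced by rerunning with the same seed and length.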
@@ -161,13 +167,88 @@ impl S3Storage {
}

pub(crate) async fn put_file(&self, key: &str, path: &str) -> Result<(), S3PutError> {
#[tokio::test]
#[cfg(CHROMA_KUBERNETES_INTEGRATION)]
async fn test_put_file_scenarios() {
I wanted to use proptest, but apparently proptest doesn't work for async functions yet :/ I guess I could block_on if that's preferred.
I think some unit tests are good enough here
rust/worker/src/storage/s3.rs
Outdated
Ok(())
}

fn part_number_offset_length_iter(
At first I was going to factor out more of the shared multipart upload logic and use an async closure parameter that returns a ByteStream given an offset and length, but I was running into some lifetime/type issues, so I went with this simpler approach for now.
yeah that would have been nice
figured out how to do this, much cleaner and now I know a little more about async rust :)
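For readers following the thread, the part number / offset / length iteration discussed above could look roughly like the sketch below. This is an assumption-laden illustration, not the PR's code: `part_ranges`, its parameters, and the tuple layout are hypothetical, since the real signature is not shown in the diff excerpt.

```rust
/// Hypothetical helper: yields `(part_number, offset, length)` for each part
/// of an upload of `total_size_bytes`, split into chunks of `part_size_bytes`.
/// Part numbers start at 1, which is what S3 multipart uploads expect.
fn part_ranges(
    total_size_bytes: u64,
    part_size_bytes: u64,
) -> impl Iterator<Item = (i32, u64, u64)> {
    assert!(part_size_bytes > 0, "part size must be non-zero");

    // Ceiling division: the last part may be shorter than the others.
    let part_count = (total_size_bytes + part_size_bytes - 1) / part_size_bytes;

    (0..part_count).map(move |index| {
        let offset = index * part_size_bytes;
        // Clamp the final part to whatever bytes remain.
        let length = part_size_bytes.min(total_size_bytes - offset);
        ((index + 1) as i32, offset, length)
    })
}
```

A caller (for example, a closure that builds a byte stream for a given offset and length) can then consume the tuples one at a time; a 20-byte input with 8-byte parts yields two full parts and a final 4-byte part.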
e944960 to 8ab22c6
e772290 to bd0e286
rust/worker/chroma_config.yaml
Outdated
@@ -28,6 +28,7 @@ query_service:
credentials: "Minio"
connect_timeout_ms: 5000
request_timeout_ms: 30000 # 30 seconds
part_size_bytes: 8388608 # 8MB
maybe "upload_part_size" is clearer?
I used upload_part_size_bytes since it seemed like the convention was to include units; I can drop them if you prefer.
Sorry, I meant upload_part_size_bytes, yes.
I think we should put some time into seeing if we can upload without copying, since the data we write can be many GB.
S3 multipart limitations
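As a rough illustration of why these limits matter for the configured part size, the sketch below checks a part size against the commonly documented S3 multipart constraints (parts between 5 MiB and 5 GiB, except that the last part may be smaller, and at most 10,000 parts per upload). The constants reflect AWS documentation as I understand it and should be verified there; `plan_part_count` and `PartPlanError` are hypothetical names, not part of this PR.

```rust
const MIB: u64 = 1024 * 1024;
/// Assumed S3 limits; check the AWS documentation before relying on them.
const MIN_PART_SIZE_BYTES: u64 = 5 * MIB; // does not apply to the last part
const MAX_PART_SIZE_BYTES: u64 = 5 * 1024 * MIB; // 5 GiB
const MAX_PART_COUNT: u64 = 10_000;

#[derive(Debug, PartialEq)]
enum PartPlanError {
    PartTooSmall,
    PartTooLarge,
    TooManyParts,
}

/// Hypothetical helper: returns how many parts an upload would need,
/// or an error if the plan would violate the assumed limits.
fn plan_part_count(total_size_bytes: u64, part_size_bytes: u64) -> Result<u64, PartPlanError> {
    if part_size_bytes < MIN_PART_SIZE_BYTES {
        return Err(PartPlanError::PartTooSmall);
    }
    if part_size_bytes > MAX_PART_SIZE_BYTES {
        return Err(PartPlanError::PartTooLarge);
    }
    // Ceiling division; the final part may be shorter than the rest.
    let part_count = (total_size_bytes + part_size_bytes - 1) / part_size_bytes;
    if part_count > MAX_PART_COUNT {
        return Err(PartPlanError::TooManyParts);
    }
    Ok(part_count)
}
```

Under these assumptions, the 8388608-byte part size from the config caps a single upload at 10,000 × 8 MiB (about 78 GiB), which is worth keeping in mind given that the data to write can be many GB.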